This page contains the code used to analyze personal networks (ego networks) on Twitter.

Set up

library(rtweet)
source("createTokens.R")  ## private keys and tokens
source("rtweet_functions.R") ## functions for working with multiple tokens

library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
library(ggwordcloud)
library(tidytext)
theme_set(theme_custom())

The first step is to choose a focal user (or “ego”) around whom we build a personal network.

ego <- "Danielramirezzr" # Daniel Ramírez
ego_info <- lookup_users(ego, token = sample(token, 1))

ego_info$followers_count
## [1] 3197

Name: DanielX

User: Danielramirezzr

Followers: 3197

Friends: 570

Joined Twitter on 2015-03-14 15:58:27

This analysis is divided into three parts.

  1. The follower network of the focal user
  2. The “friend” network of the focal user (the accounts the user follows)
  3. The friend-follower network of the focal user

Each of these three dimensions corresponds to a different flow of interaction. The first consists of the users who receive information from Danielramirezzr; the second, of the users who generate the information Danielramirezzr receives; and the third, of the users with whom the flow of information is reciprocal.
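
As a minimal illustration of these three dimensions, the sketch below uses made-up user IDs (not real accounts) and base R set operations. The names `followers`, `friends`, `mutuals` and `followers_only` are assumptions for this example only.

```r
# Hypothetical user IDs, for illustration only
followers <- c("u1", "u2", "u3", "u4")  # accounts that follow the ego
friends   <- c("u3", "u4", "u5")        # accounts the ego follows

# Mutuals: accounts present in both lists (reciprocal information flow)
mutuals <- intersect(followers, friends)
mutuals

# Followers the ego does not follow back (one-way flow from the ego)
followers_only <- setdiff(followers, friends)
followers_only
```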

This code is freely available, except for the private keys and tokens, which can be obtained by opening a developer account at https://developer.twitter.com/

Follower network

The following code extracts the list of Danielramirezzr's followers (each identified by a user_id).

ego_followers <- get_followers(ego, token = sample(token, 1))
ego_followers
## # A tibble: 3,197 x 1
##    user_id            
##    <chr>              
##  1 2514365882         
##  2 119181290          
##  3 1326656879821545474
##  4 1730577278         
##  5 1220145256848666626
##  6 52419142           
##  7 1325594948591374338
##  8 870698812884492289 
##  9 1319759601861120007
## 10 1315994406156218368
## # … with 3,187 more rows

This user_id is unique to each account and stays the same even when the user changes their screen name.

The following code creates a folder named *_friends_of_followers/ that stores, for each of these users, the list of accounts they follow (their “friends”).

Depending on the number of users and the number of tokens, this can take several hours (or even days).

outfolder <- paste0(ego, "_friends_of_followers/")
if (!dir.exists(outfolder)) dir.create(outfolder)
users_done <- str_replace(dir(outfolder), "\\.rds$", "")
users_left <- setdiff(ego_followers$user_id, users_done)

while (length(users_left) > 0) { 
  
  new_user <- users_left[[1]]
  
  friends_of_user <- try(multi_get_friends(new_user, token))
  
  file_name <- str_glue("{outfolder}{new_user}.rds")
  write_rds(friends_of_user, file_name, compress = "gz")
  users_left <- users_left[-which(users_left %in% new_user)] ## int. subset
  
}
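
For a rough sense of the running time, the sketch below assumes the standard Twitter API v1.1 limit of 15 `friends/ids` requests per 15-minute window per token and up to 5,000 IDs per request (the same page size used in `rtweet_functions.R`). The function `estimate_hours()` is hypothetical, and the calculation is a lower bound that ignores user lookups, errors and network latency.

```r
# Hypothetical helper: lower-bound estimate of the download time in hours
estimate_hours <- function(friends_counts, n_tokens,
                           ids_per_request = 5000,
                           requests_per_window = 15,
                           window_minutes = 15) {
  # each user needs at least one request, plus one per extra 5,000 friends
  n_requests <- sum(pmax(1, ceiling(friends_counts / ids_per_request)))
  # number of rate-limit windows needed when all tokens work in parallel
  n_windows <- ceiling(n_requests / (requests_per_window * n_tokens))
  n_windows * window_minutes / 60
}

# 3,197 followers, each assumed (for illustration) to follow 1,000 accounts
estimate_hours(rep(1000, 3197), n_tokens = 1)
estimate_hours(rep(1000, 3197), n_tokens = 5)
```

Under these assumptions a single token needs about 53.5 hours, and five tokens bring that down to roughly 10.75 hours.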

For some users this information cannot be retrieved because their accounts are protected.

In this case, no information could be obtained for 16.9% of Danielramirezzr's followers.
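
The code behind this percentage is not shown. One plausible way to compute it, sketched here on a toy list rather than the real files, is to count how many stored objects are `try-error` results from the download loop. The object names `results` and `failed` are assumptions.

```r
# Toy stand-in for the files read from disk: two successful downloads
# and one failed call captured by try()
results <- list(
  data.frame(from = "u1", to = "u9"),
  try(stop("The account is protected"), silent = TRUE),
  data.frame(from = "u2", to = "u9")
)

# TRUE where the stored object is an error rather than a data frame
failed <- vapply(results, inherits, logical(1), what = "try-error")

# Percentage of users whose friend list could not be retrieved
round(100 * mean(failed), 1)
```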

Edge list

To build the network, we take the full list of users and their friends and arrange it in two columns, where each row indicates one user (from) following another user (to).

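edge_list <- list.files(outfolder, full.names = TRUE) %>% 
  map(read_rds)

## flag users whose download failed (stored as try-error objects)
error_index <- edge_list %>% 
  map_lgl(~ inherits(.x, "try-error"))

edge_list <- edge_list[!error_index] %>% 
  bind_rows() 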

edge_list
## # A tibble: 4,287,491 x 2
##    from      to                 
##    <chr>     <chr>              
##  1 100049987 1557085998         
##  2 100049987 249409369          
##  3 100049987 56408044           
##  4 100049987 1074674327646339072
##  5 100049987 1244686639558840320
##  6 100049987 1205685150         
##  7 100049987 1280251338         
##  8 100049987 741375844849979392 
##  9 100049987 1104449323012685824
## 10 100049987 1011948205         
## # … with 4,287,481 more rows

There are 4,287,491 connections here. However, these include connections with users beyond those who follow Danielramirezzr.

ego_followers_info <- lookup_users(ego_followers$user_id, token = sample(token, 1))
write_rds(ego_followers_info, paste0(ego, "_follower_info.rds"), compress = "gz")

We can also retrieve metadata about each user.

ego_followers_info <- read_rds(paste0(ego, "_follower_info.rds")) %>% 
  filter(!protected) %>% 
  select(
    user_id, screen_name, lang, name, location, description,
    ends_with("count"), -starts_with("quote"), 
    -starts_with("retweet"), -reply_count,
    -starts_with("fav")
    ) %>% 
    rename(name = user_id, user_name = name)

id_dict <- ego_followers_info %>% 
  select(name, screen_name) %>% 
  deframe()

For example, this is the information for the followers of Danielramirezzr who themselves have the most followers.

ego_followers_info %>% 
  arrange(desc(followers_count)) %>% 
  select(screen_name, description, location, followers_count, friends_count)
## # A tibble: 2,634 x 5
##    screen_name  description            location    followers_count friends_count
##    <chr>        <chr>                  <chr>                 <int>         <int>
##  1 Brn634       "Official creator of … "Medellín,…           75537          1864
##  2 gerardbermon "Aunque me encanta ir… "Medellín,…           73278          1491
##  3 HerranzJC    "¡Sígueme en YouTube!… "Madrid"              57788          6721
##  4 gergeriin    "Peludo & Pachón \U00… "CDMX "               56324          8305
##  5 SebastianG0… "Sólo respondo DM cua… ""                    53718          1072
##  6 AndresCamil… "Jefe de Comunicacion… "Bogotá, D…           45249         20320
##  7 MMMaldonadoC "Dimensión jurídica d… "Bogotá"              43383          2434
##  8 AndressVerg… "NO apto para menores… "Cali, Col…           37658          1117
##  9 SoyElTorito… "🚫PERFIL XXX ☢🔞\nCERO… "Cuauhtémo…           36463         14750
## 10 Csuberxx     "En cualquier momento… "Barranqui…           34931          2969
## # … with 2,624 more rows

Finally, we are interested in the personal follower network of Danielramirezzr, so we remove connections involving users outside the ego's 3197 followers.

edge_list <- edge_list %>% 
  filter(to %in% ego_followers_info$name) %>% 
  filter(from %in% ego_followers_info$name)

edge_list
## # A tibble: 89,811 x 2
##    from      to                 
##    <chr>     <chr>              
##  1 100049987 2189410279         
##  2 100049987 136085801          
##  3 100049987 2209542892         
##  4 100049987 152714308          
##  5 100049987 133387904          
##  6 100049987 88534750           
##  7 100049987 1030279999889309697
##  8 100049987 180151491          
##  9 100049987 114499505          
## 10 100049987 1242512791         
## # … with 89,801 more rows

The personal follower network of Danielramirezzr that we were able to reconstruct has 2634 users with 89811 connections.
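
The filtering step can be illustrated on a toy edge list. This is a base R sketch with made-up IDs, not the tidyverse pipeline used in the analysis: an edge is kept only when both endpoints belong to the follower set.

```r
# Hypothetical edge list: "a" and "b" are followers of the ego, "x" is not
edges <- data.frame(
  from = c("a", "a", "b", "x"),
  to   = c("b", "x", "a", "a"),
  stringsAsFactors = FALSE
)
followers <- c("a", "b")

# Keep an edge only when both endpoints belong to the follower set
inside <- edges[edges$from %in% followers & edges$to %in% followers, ]
nrow(inside)
```

Here only the two edges between "a" and "b" survive, which is why the edge list shrinks so sharply once it is restricted to the ego's followers.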

Personal network

ego_network <- edge_list %>% 
  tidygraph::as_tbl_graph() %>% 
  left_join(ego_followers_info) %>% 
  rename(name = screen_name, user_id = name) %>% 
  select(name, everything())

ego_network
## # A tbl_graph: 2563 nodes and 89811 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 2,563 x 10 (active)
##   name  user_id lang  user_name location description followers_count
##   <chr> <chr>   <chr> <chr>     <chr>    <chr>                 <int>
## 1 Maur… 100049… en    Mauricio  Cartage… "Costeño. …             197
## 2 Jair… 100051… und   JairoEst… AXM - C… "Más de gu…              88
## 3 Abal… 100087… es    Juliana … Colombia "Escribo p…             148
## 4 juan… 100246… und   Felipe G… Villeta… ""                      132
## 5 leos… 100277… und   leo sbro… Bogotá,… "https://t…             482
## 6 Pipe… 100396… es    Felipe M… Bogotá,… "Médico - …            1544
## # … with 2,557 more rows, and 3 more variables: friends_count <int>,
## #   listed_count <int>, statuses_count <int>
## #
## # Edge Data: 89,811 x 2
##    from    to
##   <int> <int>
## 1     1  1244
## 2     1   943
## 3     1  1254
## # … with 89,808 more rows

Descriptive statistics

ego_network <- ego_network %>% 
  mutate(
    out_degree = centrality_degree(mode = "out"),
    in_degree = centrality_degree(mode = "in"),
    betweenness = centrality_betweenness(directed = TRUE),
    authority_score = centrality_authority(),
    eigen_centrality = centrality_eigen(directed = TRUE)
  )

ego_network
## # A tbl_graph: 2563 nodes and 89811 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 2,563 x 15 (active)
##   name  user_id lang  user_name location description followers_count
##   <chr> <chr>   <chr> <chr>     <chr>    <chr>                 <int>
## 1 Maur… 100049… en    Mauricio  Cartage… "Costeño. …             197
## 2 Jair… 100051… und   JairoEst… AXM - C… "Más de gu…              88
## 3 Abal… 100087… es    Juliana … Colombia "Escribo p…             148
## 4 juan… 100246… und   Felipe G… Villeta… ""                      132
## 5 leos… 100277… und   leo sbro… Bogotá,… "https://t…             482
## 6 Pipe… 100396… es    Felipe M… Bogotá,… "Médico - …            1544
## # … with 2,557 more rows, and 8 more variables: friends_count <int>,
## #   listed_count <int>, statuses_count <int>, out_degree <dbl>,
## #   in_degree <dbl>, betweenness <dbl>, authority_score <dbl>,
## #   eigen_centrality <dbl>
## #
## # Edge Data: 89,811 x 2
##    from    to
##   <int> <int>
## 1     1  1244
## 2     1   943
## 3     1  1254
## # … with 89,808 more rows
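
To make the degree measures concrete, here is a toy directed graph built directly with igraph, the package the tidygraph verbs build on. The four accounts and their follow relations are invented for illustration.

```r
library(igraph)

# Hypothetical follow relations: each row means `from` follows `to`
edges <- data.frame(
  from = c("a", "b", "c", "c"),
  to   = c("c", "c", "a", "d"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = TRUE)

# in-degree: how many users in the network follow this account
in_deg <- degree(g, mode = "in")
in_deg

# out-degree: how many users in the network this account follows
out_deg <- degree(g, mode = "out")
out_deg
```

In this toy network "c" is followed by two accounts and follows two accounts, while "d" is followed by one account and follows nobody.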

The following plot shows each user's influence on Twitter (horizontal axis) versus each user's influence within the personal follower network (vertical axis).

ego_network %>% 
  as_tibble() %>% 
  #filter(in_degree > 5) %>% 
  ggplot(aes(followers_count, in_degree)) + 
  geom_point() 

ego_network %>% 
  as_tibble() %>% 
  mutate(label_name = ifelse(
    test = rank(-followers_count) <= 10 | rank(-in_degree) <= 10, 
    yes = name, 
    no = NA_character_)
    ) %>% 
  ggplot(aes(followers_count, in_degree)) + 
  geom_point() + 
  ggrepel::geom_label_repel(aes(label = label_name), size = 3)

Clusters

set.seed(123)
clusters <- igraph::cluster_walktrap(graph = ego_network, steps = 7)

cluster_df <- tibble(cluster = factor(clusters$membership), name = clusters$names) 
  
cluster_df <- cluster_df %>% 
  group_by(cluster) %>% 
  filter(n() >= 10) %>% 
  ungroup()
ego_network <- ego_network %>% 
  left_join(cluster_df)
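
As a small sketch of what the clustering step does, the example below runs the same walktrap algorithm on an artificial graph made of two tightly connected groups joined by a single tie. The graph and the size threshold of 5 are assumptions chosen so the example stays tiny.

```r
library(igraph)

# Toy undirected graph: two 5-node cliques joined by a single edge
g <- disjoint_union(make_full_graph(5), make_full_graph(5))
g <- add_edges(g, c(5, 6))

# Random-walk community detection, with the same `steps` value as above
wt <- cluster_walktrap(g, steps = 7)
m <- as.integer(membership(wt))
m

# Keep only clusters with at least 5 members (the analysis above uses 10)
sizes <- table(m)
names(sizes)[sizes >= 5]
```

Short random walks tend to stay inside densely connected groups, so the algorithm should place each clique in its own cluster.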

ego_network %>% 
  as_tibble() %>% 
  arrange(desc(in_degree)) %>% 
  filter(!is.na(cluster)) %>% 
  group_by(cluster) %>%
  filter(rank(-authority_score) <= 30) %>% 
  ggplot(aes(label = name, size = log(in_degree), color = in_degree)) + 
  geom_text_wordcloud_area(family = "Avenir Next Condensed") + 
  facet_wrap(~cluster) + 
  labs(title = "Prominent followers in each cluster") + 
  scale_color_gradient(low = "grey", high = "purple") 

Size of each cluster:

ego_network %>% as_tibble() %>% count(cluster)
## # A tibble: 7 x 2
##   cluster     n
##   <fct>   <int>
## 1 1         720
## 2 2         295
## 3 5          50
## 4 9         144
## 5 11        304
## 6 16        921
## 7 <NA>      129

Which users act as “bridges”?

ego_network %>% 
  as_tibble() %>% 
  arrange(desc(betweenness)) %>% 
  select(name, description, location)
## # A tibble: 2,563 x 3
##    name        description                                      location        
##    <chr>       <chr>                                            <chr>           
##  1 Elbuhonejo  "Contador pero no de chistes, Cinéfilo, Viajero… "Bogotá, D.C., …
##  2 AndresCami… "Jefe de Comunicaciones y Prensa del Senador @p… "Bogotá, D.C., …
##  3 MaoCelisCa  "Tech Lover | MTB"                               "Colombia"      
##  4 ALEJOMICHE… "Abogado/Activista DDHH, trabajando por la Prop… "Bogota Colombi…
##  5 netchmusic  "Cantautor.  \n\nHacer canciones es lo único qu… "En la Luna "   
##  6 DonDanielin "Más ficción que realidad."                      "Colombia"      
##  7 JairoSoto   "Yo no miento, exagero. Periodista y barranquil… "Bogotá, D.C., …
##  8 Juandam_m   "Manizalita de acento rolo. Ex-gordo. Inventado… "Colombia"      
##  9 javro       ""                                               "Bogotá, D.C., …
## 10 Elbayabuyi… "Con alma de gordo, intenso, de Sogamoso para e… "Bogotá"        
## # … with 2,553 more rows

cols <- c("betweenness", "in_degree", "out_degree", "followers_count", "friends_count")

ego_network %>% 
  as_tibble() %>% 
  group_by(cluster) %>% 
  summarize(across(all_of(cols), mean)) %>% 
  arrange(desc(betweenness))
## # A tibble: 7 x 6
##   cluster betweenness in_degree out_degree followers_count friends_count
##   <fct>         <dbl>     <dbl>      <dbl>           <dbl>         <dbl>
## 1 11            5076.     29.1       29.2            1444.         1381.
## 2 16            4929.     63.1       53.8            1216.         1135.
## 3 1             3632.     22.1       33.0            1329.         2232.
## 4 9             2155.     17.6       15.6            1883.         1920.
## 5 2             1762.     12.1       14.8            1709.         2379.
## 6 5             1392.     10.0       13.1             950.         1547.
## 7 <NA>          1055.      2.05       2.64            341.          645.
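
Betweenness is the measure used here to identify “bridges”. The toy example below, an undirected graph with invented node names, shows why: the two nodes that connect otherwise separate groups sit on every shortest path between those groups.

```r
library(igraph)

# Two triangles (a-b-c and d-e-f) connected only through the edge c-d
edges <- data.frame(
  from = c("a", "a", "b", "c", "d", "d", "e"),
  to   = c("b", "c", "c", "d", "e", "f", "f"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Number of shortest paths between other nodes that pass through each node
b <- betweenness(g)
sort(b, decreasing = TRUE)
```

Nodes "c" and "d" each lie on six shortest paths, while the remaining nodes lie on none. The real network is directed, which changes the counts but not the interpretation.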

Subset

Given the information above, we can focus on particular segments of the personal network.

For example, we can focus exclusively on the users who belong to the groups labeled 9 and 11.

ego_network_subset <- ego_network %>% 
  filter(cluster %in% c(9, 11)) %>% 
    mutate(
    out_degree = centrality_degree(mode = "out"),
    in_degree = centrality_degree(mode = "in"),
    betweenness = centrality_betweenness(directed = TRUE),
    authority_score = centrality_authority(),
    eigen_centrality = centrality_eigen(directed = TRUE)
  ) 

ego_network_subset %>% 
  ggraph("mds") +
  geom_edge_fan(alpha = 1/5, width = 1/5) + 
  geom_node_point(aes(fill = cluster, size = in_degree), 
                  shape = 21, color = "white", show.legend = FALSE) 

ego_network_subset %>% 
  as_tibble() %>% 
  mutate(label_id = ifelse(
    test = rank(-betweenness) <= 10 |rank(-in_degree) <= 10, 
    yes = name, 
    no = NA_character_)
    ) %>% 
  ggplot(aes(betweenness, in_degree, color = cluster)) +
  geom_point() +
  ggrepel::geom_label_repel(aes(label = label_id), size = 3)

ego_network_subset %>% 
  ggraph("mds") +
  geom_edge_fan(alpha = 1/5, width = 1/5) + 
  geom_node_point(aes(fill = cluster, size = authority_score), 
                  shape = 21, color = "white", show.legend = FALSE) +
  geom_node_label(aes(filter = rank(-authority_score) <= 10, 
                      label = name), 
                  repel = TRUE, alpha = 3/4, size = 3) 

Friend network

This section repeats the previous analysis for Danielramirezzr's personal network of friends.

outfolder <- paste0(ego, "_friends_of_friends/")
if (!dir.exists(outfolder)) dir.create(outfolder)

ego_friends <- get_friends(ego, token = sample(token, 1))
ego_friends
## # A tibble: 570 x 2
##    user            user_id   
##    <chr>           <chr>     
##  1 Danielramirezzr 1376071800
##  2 Danielramirezzr 289506642 
##  3 Danielramirezzr 555188075 
##  4 Danielramirezzr 57666196  
##  5 Danielramirezzr 322011448 
##  6 Danielramirezzr 576624661 
##  7 Danielramirezzr 1730577278
##  8 Danielramirezzr 174443391 
##  9 Danielramirezzr 588952390 
## 10 Danielramirezzr 142171297 
## # … with 560 more rows

users_done <- str_replace(dir(outfolder), "\\.rds$", "")
users_left <- setdiff(ego_friends$user_id, users_done)

while (length(users_left) > 0) { 
  
  new_user <- users_left[[1]]
  
  friends_of_user <- try(multi_get_friends(new_user, token))
  
  file_name <- str_glue("{outfolder}{new_user}.rds")
  write_rds(friends_of_user, file_name, compress = "gz")
  users_left <- users_left[-which(users_left %in% new_user)] ## int. subset
  
}

In this case, no information could be obtained for 0.5% of Danielramirezzr's friends.

Edge list

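edge_list <- list.files(outfolder, full.names = TRUE) %>% 
  map(read_rds)

## flag users whose download failed (stored as try-error objects)
error_index <- map_lgl(edge_list, ~ inherits(.x, "try-error"))

edge_list <- edge_list[!error_index] %>% bind_rows()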

edge_list
## # A tibble: 805,134 x 2
##    from       to                 
##    <chr>      <chr>              
##  1 1000511132 919976669213020160 
##  2 1000511132 752529589755404288 
##  3 1000511132 1109856955101790209
##  4 1000511132 95033445           
##  5 1000511132 48396652           
##  6 1000511132 39955069           
##  7 1000511132 1113640592842600448
##  8 1000511132 161106995          
##  9 1000511132 2400080066         
## 10 1000511132 3001776580         
## # … with 805,124 more rows

ego_friends_info <- lookup_users(ego_friends$user_id, token = token)
write_rds(ego_friends_info, paste0(ego, "_friends_info.rds"), compress = "gz")

ego_friends_info <- read_rds(paste0(ego, "_friends_info.rds")) %>% 
  filter(!protected) %>% 
  select(
    user_id, screen_name, lang, name, location, description,
    ends_with("count"), -starts_with("quote"), 
    -starts_with("retweet"), -reply_count,
    -starts_with("fav")
    ) %>% 
    rename(name = user_id, user_name = name)

id_dict <- ego_friends_info %>% 
  select(name, screen_name) %>% 
  deframe()

This is the information for the friends of Danielramirezzr who have the most followers.

ego_friends_info %>% 
  arrange(desc(followers_count)) %>% 
  select(screen_name, description, location, followers_count, friends_count)
## # A tibble: 552 x 5
##    screen_name  description             location   followers_count friends_count
##    <chr>        <chr>                   <chr>                <int>         <int>
##  1 elespectador "Noticias de Colombia … Bogotá, C…         5412852         51738
##  2 RevistaSema… "Periodismo con caráct… Colombia           4561113            45
##  3 bbcmundo     "Twitter oficial de BB… Londres, …         4176206           420
##  4 petrogustavo "Perfil Oficial del di… ÜT: 4.650…         4008247          2461
##  5 Citytv       "Información de Colomb… Bogotá             3063947          3537
##  6 ClaudiaLopez "Primera Alcaldesa de … Bogotá, D…         2471874          2508
##  7 UN_Women     "UN Women is the UN en… Worldwide          1929842          4209
##  8 Bogota       "Twitter oficial de la… Bogotá, C…         1675261          2741
##  9 SectorMovil… "Información oficial d… Bogotá, C…         1451865           643
## 10 MJDuzan      "Periodista"            Colombia           1114841          4529
## # … with 542 more rows

edge_list <- edge_list %>% 
  filter(to %in% ego_friends_info$name) %>% 
  filter(from %in% ego_friends_info$name)

edge_list
## # A tibble: 26,762 x 2
##    from       to        
##    <chr>      <chr>     
##  1 1000511132 161106995 
##  2 1000511132 588952390 
##  3 1000511132 2216042148
##  4 1000511132 180151491 
##  5 1000511132 86995313  
##  6 1000511132 229837949 
##  7 1000511132 39431290  
##  8 1000511132 78920906  
##  9 1000511132 10012122  
## 10 1000511132 127988585 
## # … with 26,752 more rows

The personal friend network of Danielramirezzr that we were able to reconstruct has 552 users with 26762 connections.

Personal network

ego_network <- edge_list %>% 
  tidygraph::as_tbl_graph() %>% 
  left_join(ego_friends_info) %>% 
  rename(name = screen_name, user_id = name) %>% 
  select(name, everything())

ego_network
## # A tbl_graph: 551 nodes and 26762 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 551 x 10 (active)
##   name  user_id lang  user_name location description followers_count
##   <chr> <chr>   <chr> <chr>     <chr>    <chr>                 <int>
## 1 Jair… 100051… und   JairoEst… AXM - C… "Más de gu…              88
## 2 bbcm… 100121… es    BBC News… Londres… "Twitter o…         4176206
## 3 Nava… 100403… es    Esteban … Bogotá,… "Gender & …             584
## 4 Brig… 101486… es    Brigitte… Bogotá,… "Naturalme…          115467
## 5 elbi… 101498… es    A.        Miami, … ""                       11
## 6 Marc… 101511… es    𝕞𝕒𝕣𝕔𝕚𝕒𝕟𝕒… Bogotá,… "𝐏𝐮𝐭𝐚 𝐯𝐢𝐫𝐭…           29861
## # … with 545 more rows, and 3 more variables: friends_count <int>,
## #   listed_count <int>, statuses_count <int>
## #
## # Edge Data: 26,762 x 2
##    from    to
##   <int> <int>
## 1     1   153
## 2     1   409
## 3     1   213
## # … with 26,759 more rows

Descriptive statistics

ego_network <- ego_network %>% 
  mutate(
    out_degree = centrality_degree(mode = "out"),
    in_degree = centrality_degree(mode = "in"),
    betweenness = centrality_betweenness(directed = TRUE),
    authority_score = centrality_authority(),
    eigen_centrality = centrality_eigen(directed = TRUE)
  )

ego_network
## # A tbl_graph: 551 nodes and 26762 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 551 x 15 (active)
##   name  user_id lang  user_name location description followers_count
##   <chr> <chr>   <chr> <chr>     <chr>    <chr>                 <int>
## 1 Jair… 100051… und   JairoEst… AXM - C… "Más de gu…              88
## 2 bbcm… 100121… es    BBC News… Londres… "Twitter o…         4176206
## 3 Nava… 100403… es    Esteban … Bogotá,… "Gender & …             584
## 4 Brig… 101486… es    Brigitte… Bogotá,… "Naturalme…          115467
## 5 elbi… 101498… es    A.        Miami, … ""                       11
## 6 Marc… 101511… es    𝕞𝕒𝕣𝕔𝕚𝕒𝕟𝕒… Bogotá,… "𝐏𝐮𝐭𝐚 𝐯𝐢𝐫𝐭…           29861
## # … with 545 more rows, and 8 more variables: friends_count <int>,
## #   listed_count <int>, statuses_count <int>, out_degree <dbl>,
## #   in_degree <dbl>, betweenness <dbl>, authority_score <dbl>,
## #   eigen_centrality <dbl>
## #
## # Edge Data: 26,762 x 2
##    from    to
##   <int> <int>
## 1     1   153
## 2     1   409
## 3     1   213
## # … with 26,759 more rows

The following plot shows each user's influence on Twitter (horizontal axis) versus each user's influence within the personal friend network (vertical axis).

ego_network %>% 
  as_tibble() %>% 
  #filter(in_degree > 5) %>% 
  ggplot(aes(followers_count, in_degree)) + 
  geom_point() 

ego_network %>% 
  as_tibble() %>% 
  mutate(label_name = ifelse(
    test = rank(-followers_count) <= 10 | rank(-in_degree) <= 10, 
    yes = name, 
    no = NA_character_)
    ) %>% 
  ggplot(aes(followers_count, in_degree)) + 
  geom_point() + 
  ggrepel::geom_label_repel(aes(label = label_name), size = 3)

Clusters

clusters <- igraph::cluster_walktrap(graph = ego_network, steps = 7)

cluster_df <- tibble(cluster = factor(clusters$membership), name = clusters$names) 
  
cluster_df <- cluster_df %>% 
  group_by(cluster) %>% 
  filter(n() >= 10) %>% 
  ungroup()
ego_network <- ego_network %>% 
  left_join(cluster_df)

ego_network %>% 
  as_tibble() %>% 
  arrange(desc(in_degree)) %>% 
  filter(!is.na(cluster)) %>% 
  group_by(cluster) %>%
  filter(rank(-authority_score) <= 50) %>% 
  ggplot(aes(label = name, size = log(in_degree), color = in_degree)) + 
  geom_text_wordcloud_area(family = "Avenir Next Condensed") + 
  facet_wrap(~cluster) + 
  labs(title = "Prominent friends in each cluster") + 
  scale_color_gradient(low = "grey", high = "purple") 

Size of each cluster:

ego_network %>% as_tibble() %>% count(cluster)
## # A tibble: 3 x 2
##   cluster     n
##   <fct>   <int>
## 1 1         294
## 2 2         254
## 3 <NA>        3

Which users act as “bridges”?

ego_network %>% 
  as_tibble() %>% 
  arrange(desc(betweenness)) %>% 
  select(name, description, location)
## # A tibble: 551 x 3
##    name        description                                      location        
##    <chr>       <chr>                                            <chr>           
##  1 AndresCami… "Jefe de Comunicaciones y Prensa del Senador @p… "Bogotá, D.C., …
##  2 AngelicaLo… "Ciudadana, senadora de Colombia 🇨🇴 Partido Ver… ""              
##  3 JairoSoto   "Yo no miento, exagero. Periodista y barranquil… "Bogotá, D.C., …
##  4 sergemont   "Profesor Asociado en Desarrollo Urbano y Regio… "Bogotá, D.C., …
##  5 Elbayabuyi… "Con alma de gordo, intenso, de Sogamoso para e… "Bogotá"        
##  6 ismene2     "Mamerta.🔻"                                     "Colombia"      
##  7 angelamrob… "Psicóloga. Mg. en Política Social. En oposició… "Colombia"      
##  8 ClaudiaLop… "Primera Alcaldesa de Bogotá. Orgullosa bogota…  "Bogotá, DC, Co…
##  9 ALEJOMICHE… "Abogado/Activista DDHH, trabajando por la Prop… "Bogota Colombi…
## 10 MJDuzan     "Periodista"                                     "Colombia"      
## # … with 541 more rows

cols <- c("betweenness", "in_degree", "out_degree", "followers_count", "friends_count")

ego_network %>% 
  as_tibble() %>% 
  group_by(cluster) %>% 
  summarize(across(all_of(cols), mean)) %>% 
  arrange(desc(betweenness))
## # A tibble: 3 x 6
##   cluster betweenness in_degree out_degree followers_count friends_count
##   <fct>         <dbl>     <dbl>      <dbl>           <dbl>         <dbl>
## 1 <NA>           973.      2.67        5              229           216 
## 2 1              674.     55.0        61.2          13224.         1188.
## 3 2              619.     41.7        34.4         166824.         1741.

Subset

ego_network_subset <- ego_network %>% 
  filter(!is.na(cluster)) %>% 
    mutate(
    out_degree = centrality_degree(mode = "out"),
    in_degree = centrality_degree(mode = "in"),
    betweenness = centrality_betweenness(directed = TRUE),
    authority_score = centrality_authority(),
    eigen_centrality = centrality_eigen(directed = TRUE)
  ) 

ego_network_subset %>% 
  ggraph("mds") +
  geom_edge_fan(alpha = 1/5, width = 1/5) + 
  geom_node_point(aes(fill = cluster, size = in_degree), 
                  shape = 21, color = "white", show.legend = FALSE) 

ego_network_subset %>% 
  as_tibble() %>% 
  mutate(label_id = ifelse(
    test = rank(-betweenness) <= 10 |rank(-in_degree) <= 10, 
    yes = name, 
    no = NA_character_)
    ) %>% 
  ggplot(aes(betweenness, in_degree, color = cluster)) +
  geom_point() +
  ggrepel::geom_label_repel(aes(label = label_id), size = 3)

ego_network_subset %>% 
  ggraph("mds") +
  geom_edge_fan(alpha = 1/5, width = 1/5) + 
  geom_node_point(aes(fill = cluster, size = authority_score), 
                  shape = 21, color = "white", show.legend = FALSE) +
  geom_node_label(aes(filter = rank(-authority_score) <= 10 | rank(-betweenness) <= 10, 
                      label = name), 
                  repel = TRUE, alpha = 3/4, size = 3) 

Mutuals network

Personal network

edge_list1 <- list.files(paste0(ego, "_friends_of_friends/"), full.names = TRUE) %>% 
  map(read_rds)

error_index <- edge_list1 %>% 
  map_lgl(~ inherits(.x, "try-error"))

edge_list1 <- edge_list1[!error_index] %>% bind_rows()

edge_list2 <- list.files(paste0(ego, "_friends_of_followers/"), full.names = TRUE) %>% 
  map(read_rds)

error_index <- edge_list2 %>% 
  map_lgl(~ inherits(.x, "try-error"))

edge_list2 <- edge_list2[!error_index] %>% bind_rows()

mutual_network <- inner_join(
  edge_list1,
  edge_list2
) %>% 
  filter(from %in% ego_followers$user_id, to %in% ego_followers$user_id) %>% 
  filter(from %in% ego_friends$user_id, to %in% ego_friends$user_id) %>% 
  filter(from %in% to, to %in% from)
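
A reciprocal (“mutual”) tie exists when A follows B and B also follows A. As a sketch of that idea only, and not of the pipeline used in this analysis, the base R code below finds reciprocated pairs in a made-up edge list by matching each edge against its reverse.

```r
# Hypothetical directed edge list: each row means `from` follows `to`
edges <- data.frame(
  from = c("a", "b", "a", "c", "d"),
  to   = c("b", "a", "c", "d", "c"),
  stringsAsFactors = FALSE
)

# The same edges with the direction flipped
reversed <- data.frame(from = edges$to, to = edges$from,
                       stringsAsFactors = FALSE)

# An edge is reciprocated when it also appears in the reversed list
mutual <- merge(edges, reversed)

# Keep one row per pair so each mutual tie is counted once
mutual_pairs <- mutual[mutual$from < mutual$to, ]
mutual_pairs
```

The edge from "a" to "c" is dropped because "c" does not follow "a" back, leaving the pairs a-b and c-d.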

mutual_network <- mutual_network %>% 
  mutate(n = 1) %>% 
  tidytext::cast_sparse(from, to, n) %>% 
  graph_from_adjacency_matrix(mode = "undirected") %>% 
  tidygraph::as_tbl_graph() 

ego_mutuals_info <- lookup_users(as_tibble(mutual_network)$name, token = sample(token, 1))

ego_mutuals_info <- ego_mutuals_info %>% 
  filter(!protected) %>% 
  select(
    user_id, screen_name, lang, name, location, description,
    ends_with("count"), -starts_with("quote"), 
    -starts_with("retweet"), -reply_count,
    -starts_with("fav")
    ) %>% 
    rename(name = user_id, user_name = name)

mutual_network <- mutual_network %>% 
  inner_join(ego_mutuals_info) %>% 
  rename(name = screen_name, user_id = name) %>% 
  select(name, everything())

Descriptive statistics

mutual_network <- mutual_network %>% 
  mutate(
    out_degree = centrality_degree(mode = "out"),
    in_degree = centrality_degree(mode = "in"),
    betweenness = centrality_betweenness(directed = TRUE),
    authority_score = centrality_authority(),
    eigen_centrality = centrality_eigen(directed = TRUE)
  )

The following plot shows each user's influence on Twitter (horizontal axis) versus each user's influence within the personal network of mutuals (vertical axis).

mutual_network %>% 
  as_tibble() %>% 
  #filter(in_degree > 5) %>% 
  ggplot(aes(followers_count, in_degree)) + 
  geom_point() 

mutual_network %>% 
  as_tibble() %>% 
  mutate(label_name = ifelse(
    test = rank(-followers_count) <= 10 | rank(-in_degree) <= 10, 
    yes = name, 
    no = NA_character_)
    ) %>% 
  ggplot(aes(followers_count, in_degree)) + 
  geom_point() + 
  ggrepel::geom_label_repel(aes(label = label_name), size = 3)

Clusters

clusters <- igraph::cluster_walktrap(graph = mutual_network, steps = 7)

cluster_df <- tibble(cluster = factor(clusters$membership), name = clusters$names) 
  
cluster_df <- cluster_df %>% 
  group_by(cluster) %>% 
  filter(n() >= 10) %>% 
  ungroup()
mutual_network <- mutual_network %>% 
  left_join(cluster_df)

mutual_network %>% 
  as_tibble() %>% 
  arrange(desc(in_degree)) %>% 
  filter(!is.na(cluster)) %>% 
  group_by(cluster) %>%
  filter(rank(-authority_score) <= 50) %>% 
  ggplot(aes(label = name, size = log(in_degree), color = in_degree)) + 
  geom_text_wordcloud_area(family = "Avenir Next Condensed") + 
  facet_wrap(~cluster) + 
  labs(title = "Prominent mutuals in each cluster") + 
  scale_color_gradient(low = "grey", high = "purple") 

Size of each cluster:

mutual_network %>% as_tibble() %>% count(cluster)
## # A tibble: 3 x 2
##   cluster     n
##   <fct>   <int>
## 1 1         264
## 2 2          97
## 3 <NA>        3

Which users act as “bridges”?

mutual_network %>% 
  as_tibble() %>% 
  arrange(desc(betweenness)) 
## # A tibble: 364 x 16
##    name  user_id lang  user_name location description followers_count
##    <chr> <chr>   <chr> <chr>     <chr>    <chr>                 <int>
##  1 mmau… 131959… es    mar       "Colomb… "marica, r…            6529
##  2 Elba… 789209… es    Rodrigo … "Bogotá" "Con alma …           13115
##  3 Lech… 739925… es    SS        "Bogotá… "el del tu…            2062
##  4 Andr… 102769… es    Andrés H… "Bogotá… "Jefe de C…           45250
##  5 miss… 885764… und   La Señor… "Bogotá… "Drag Quee…           22796
##  6 Fede… 144805… es    Fede ⚡️   "Bogotá… "Guaratora…            5077
##  7 carl… 105281… es    Carlos G… "Medell… "disappoin…            8233
##  8 efes… 119873… es    Fabián E… "Bogotá" "Arquitect…              20
##  9 teba… 239268… und   Esteban   ""       "«Y estamo…            4445
## 10 d_ib… 394312… es    Daniel I… "Bogotá… "Mariachi …           13517
## # … with 354 more rows, and 9 more variables: friends_count <int>,
## #   listed_count <int>, statuses_count <int>, out_degree <dbl>,
## #   in_degree <dbl>, betweenness <dbl>, authority_score <dbl>,
## #   eigen_centrality <dbl>, cluster <fct>

cols <- c("betweenness", "in_degree", "out_degree", "followers_count", "friends_count")

mutual_network %>% 
  as_tibble() %>% 
  group_by(cluster) %>% 
  summarize(across(all_of(cols), mean)) %>% 
  arrange(desc(betweenness))
## # A tibble: 3 x 6
##   cluster betweenness in_degree out_degree followers_count friends_count
##   <fct>         <dbl>     <dbl>      <dbl>           <dbl>         <dbl>
## 1 1             169.       90.5       90.5           4346.         1242.
## 2 2              85.7      52.6       52.6           2763.         1477.
## 3 <NA>           40.3      21         21              229           216

Subset

mutual_network_subset <- mutual_network %>% 
  filter(!is.na(cluster)) %>% 
  mutate(
    out_degree = centrality_degree(mode = "out"),
    in_degree = centrality_degree(mode = "in"),
    betweenness = centrality_betweenness(directed = TRUE),
    authority_score = centrality_authority(),
    eigen_centrality = centrality_eigen(directed = TRUE)
  ) 
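The centrality measures are recomputed after filtering because the stored columns describe the full network, not the subset. The sketch below illustrates this on a hypothetical four-node network (the names `edges`, `g`, `stale`, and `fresh` are made up for the example), assuming `tidygraph` and `dplyr` are installed.

```r
library(tidygraph)
library(dplyr)

# Hypothetical directed edge list: a, b and c all point to "hub"
edges <- tibble(
  from = c("a", "b", "c", "a"),
  to   = c("hub", "hub", "hub", "b")
)

g <- as_tbl_graph(edges, directed = TRUE) %>%
  mutate(in_degree = centrality_degree(mode = "in"))

# Dropping node "c" removes its edge, but the stored column is not updated
stale <- g %>% filter(name != "c")

# Recomputing on the filtered graph gives the in-degree within the subset
fresh <- stale %>% mutate(in_degree = centrality_degree(mode = "in"))

as_tibble(stale) %>% select(name, in_degree)
as_tibble(fresh) %>% select(name, in_degree)
```

In `stale`, the hub still reports the in-degree it had in the full graph; in `fresh`, it reflects only the edges that survived the filter.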

mutual_network_subset %>% 
  ggraph("mds") +
  geom_edge_fan(alpha = 1/5, width = 1/5) + 
  geom_node_point(aes(fill = cluster, size = in_degree), 
                  shape = 21, color = "white", show.legend = FALSE) 

mutual_network_subset %>% 
  as_tibble() %>% 
  mutate(label_id = ifelse(
    test = rank(-betweenness) <= 10 | rank(-in_degree) <= 10, 
    yes = name, 
    no = NA_character_)
    ) %>% 
  ggplot(aes(betweenness, in_degree, color = cluster)) +
  geom_point() +
  ggrepel::geom_label_repel(aes(label = label_id), size = 3)
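The labelling rule in the plot above keeps a name only when the account is in the top ranks of either measure. A minimal base R sketch of that rule follows, using a hypothetical data frame (`scores`) and a smaller cutoff (`top_n = 2`) so the result is easy to check by eye.

```r
# Hypothetical scores for six accounts
scores <- data.frame(
  name        = c("u1", "u2", "u3", "u4", "u5", "u6"),
  betweenness = c(50, 10, 400, 0, 120, 5),
  in_degree   = c(90, 3, 20, 75, 8, 1),
  stringsAsFactors = FALSE
)

top_n <- 2

# rank(-x) gives rank 1 to the largest value, so `<= top_n` keeps the top entries;
# an account is labelled if it is in the top of either measure
scores$label_id <- ifelse(
  rank(-scores$betweenness) <= top_n | rank(-scores$in_degree) <= top_n,
  scores$name,
  NA_character_
)

scores
```

Accounts outside both top lists get `NA`, which label geoms drop, so only the most central accounts are annotated.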

mutual_network_subset %>% 
  ggraph("mds") +
  geom_edge_fan(alpha = 1/5, width = 1/5) + 
  geom_node_point(aes(fill = cluster, size = authority_score), 
                  shape = 21, color = "white", show.legend = FALSE) +
  geom_node_label(aes(filter = rank(-authority_score) <= 10 | rank(-betweenness) <= 10, 
                      label = name), 
                  repel = TRUE, alpha = 3/4, size = 3) 

mutual_network_subset %>% as_tibble() %>% View()

Additional functions

readLines("rtweet_functions.R") %>% 
  writeLines()
## 
## # main functions ----------------------------------------------------------
## 
## multi_get_friends <- function(u, token_list) {
##   
##   user_info <- lookup_users(u, token = sample(token_list, 1)[[1]])
##   fc <- user_info$friends_count
##   message("<<", user_info$screen_name, ">> is following ", scales::comma(fc), " users ")
##   
##   if (user_info$protected) stop(call. = FALSE, "The account is protected, we can't get friends.")
##   
##   num_queries <- ceiling(fc / 5000)
##   rl <- rate_limit(token_list, "get_friends")
##   rl <- validate_rate_limit(rl, "get_friends", token_list)
##   
##   index <- get_available_token_index(rl)
##   
##   # Case 0: User doesn't have any friends
##   
##   if (fc == 0) return(tibble(from = character(0), to = character(0))) 
##   
##   # Case 1: At most 5,000 friends, only one call is needed
##   
##   if (fc <= 5e3) {
##     
##     friends <- get_friends(u, token = token_list[[index]])
##     
##   } else {
##     
##     # Case 2: Many calls are needed
##     
##     output <- vector("list", length = num_queries)
##     output[[1]] <- get_friends(u, token = token_list[[index]])
##     
##     for (i in 2:length(output)) {
##       
##       rl <- validate_rate_limit(rl, "get_friends", token_list)
##       index <- get_available_token_index(rl)
##       output[[i]] <- get_friends(u, token = token_list[[index]], page = next_cursor(output[[i - 1]]))
##       
##     }
##     
##     friends <- bind_rows(output) %>% 
##       distinct()
##     
##   }
##   
##   attr(friends, "next_cursor") <- NULL
##   
##   friends %>% 
##     rename(from = user, to = user_id) %>% 
##     mutate(from = user_info$user_id)
##   
## }
## 
## multi_get_timeline <- function(u, n, token_list, home = FALSE) {
##   
##   message(u)
##   rl <- rate_limit(token_list, "get_timeline")
##   rl <- validate_rate_limit(rl, "get_timeline", token_list)
##   
##   index <- get_available_token_index(rl)
##   
##   # Case 0: User doesn't have any posts
##   
##   # what to do?
##   
##   # Should we allow to get all the timeline??? If so, mimic previous function
##     
##   tl <- get_timeline(u, n = n, home = home, token = token_list[[index]])
## 
##   return(tl)
##   
## }
## 
## # multi_lookup_users <- function() {
## #   
## #   
## # }
## 
## 
## # helpers -----------------------------------------------------------------
## 
## validate_rate_limit <- function(rl, q, token_list) {
##   
##   if (is_empty(rl)) {
##     message("Waiting for rate limiting update")
##     Sys.sleep(60)
##     rl <- rate_limit(token_list, query = q)
##     return(validate_rate_limit(rl, q, token_list)) # recursion: return the validated result
##     
##   }
##   
##   if (all(rl$remaining == 0)) {
##     
##     message("Waiting for token reset in ", round(min(rl$reset), 1), " minutes")
##     Sys.sleep(min(as.numeric(rl$reset_at - Sys.time(), units = "secs")) + 5)
##     rl <- rate_limit(token_list, query = q)
##     return(validate_rate_limit(rl, q, token_list)) # recursion: return the validated result
##     
##   }
##   
##   rl
##   
## }
## 
## get_available_token_index <- function(rl) {
##   
##   env <- rlang::caller_env()
##   available_token <- rl$remaining > 0
##   index <- which(available_token)[[1]]
##   env$rl[index, ]$remaining <- rl[index, ]$remaining - 1  # this modifies the rl obj in the parent frame
##   return(index)
##   
## }
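The `get_available_token_index()` helper printed above updates `rl` in the caller's frame as a side effect. As an alternative sketch, not the author's implementation, the same bookkeeping can be done by returning both the chosen index and the updated table. The function name `take_token()` and the example rate-limit table are hypothetical; the table only mimics the `remaining` column used above.

```r
# Hypothetical rate-limit table with one row per token
rl <- data.frame(
  token     = c("t1", "t2", "t3"),
  remaining = c(0, 2, 15),
  stringsAsFactors = FALSE
)

# Pick the first token with calls left and return the decremented table
# alongside the index, instead of modifying the caller's environment
take_token <- function(rl) {
  available <- which(rl$remaining > 0)
  if (length(available) == 0) {
    stop("No token has remaining calls.", call. = FALSE)
  }
  index <- available[[1]]
  rl$remaining[index] <- rl$remaining[index] - 1
  list(index = index, rl = rl)
}

picked <- take_token(rl)
picked$index     # position of the token to use for the next request
rl <- picked$rl  # the caller decides whether to keep the updated counts
```

The trade-off is one extra assignment at each call site in exchange for a function whose effect is visible in its return value.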
theme_custom
## function (base_family = "Avenir Next Condensed", fill = "white", ...) {
##     theme_minimal(base_family = base_family, ...) %+replace% 
##         theme(plot.title = element_text(face = "bold", margin = margin(0, 
##             0, 5, 0), hjust = 0, size = 13), plot.subtitle = element_text(face = "italic", 
##             margin = margin(0, 0, 5, 0), hjust = 0), plot.background = element_rect(fill = fill, 
##             size = 0), complete = TRUE, axis.title.x = element_text(margin = margin(15, 
##             0, 0, 0)), axis.title.y = element_text(angle = 90, 
##             margin = margin(0, 20, 0, 0)), strip.text = element_text(face = "italic", 
##             colour = "white"), strip.background = element_rect(fill = "#4C4C4C"))
## }